Parse Tree Database for Information Extraction
Abstract
Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules targeting the extraction of a particular kind of information. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be re-applied from scratch to the entire text corpus, even though only a small part of the corpus might be affected. In this paper, we describe a novel approach to information extraction in which extraction needs are expressed as database queries, which are evaluated and optimized by the database system. Using database queries for information extraction enables generic extraction and minimizes reprocessing of data. In addition, our approach provides two query generation components that automatically form database queries for extraction: one from training datasets, and one from unlabeled data through a mechanism inspired by the pseudo-relevance feedback approach. The approach is applied to the extraction of protein-protein interactions and drug-protein-metabolic relations from two corpora. Experiments show that our approach achieves a precision of 83.6% and recall of 58.6% (F-measure of 64.2%) for the extraction of protein-protein interactions from the BioCreative 2 corpus, while achieving a precision of 85.0% and recall of 26.0% (F-measure of 39.8%) for drug-protein-metabolic relations.
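The idea of expressing an extraction need as a database query can be illustrated with a minimal sketch. It is not the paper's schema or query language: it assumes a hypothetical pair of tables (`token`, `dep`) holding dependency-parse edges, a made-up `PROTEIN` tag, a toy sentence, and plain SQL run through Python's standard `sqlite3` module. The point is only that the parsed corpus is stored once, and a new extraction goal becomes a new query rather than a new pass over the text.

```python
import sqlite3

# Hypothetical schema (an assumption for illustration, not the paper's design):
# one row per token, one row per dependency edge of an already-parsed sentence.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token (sent_id INTEGER, tok_id INTEGER, word TEXT, tag TEXT,
                    PRIMARY KEY (sent_id, tok_id));
CREATE TABLE dep (sent_id INTEGER, head INTEGER, child INTEGER, label TEXT);
""")

# Toy parsed sentence: "RAD51 binds BRCA2" (tags and labels are illustrative).
conn.executemany("INSERT INTO token VALUES (?, ?, ?, ?)", [
    (1, 1, "RAD51", "PROTEIN"),
    (1, 2, "binds", "VERB"),
    (1, 3, "BRCA2", "PROTEIN"),
])
conn.executemany("INSERT INTO dep VALUES (?, ?, ?, ?)", [
    (1, 2, 1, "nsubj"),  # binds -> RAD51 (subject)
    (1, 2, 3, "dobj"),   # binds -> BRCA2 (object)
])

# The extraction goal as a query: two protein mentions that are the subject
# and the object of the same head word.
INTERACTION_QUERY = """
SELECT subj.word, verb.word, obj.word
FROM dep AS d_subj
JOIN dep AS d_obj
  ON d_obj.sent_id = d_subj.sent_id AND d_obj.head = d_subj.head
JOIN token AS subj
  ON subj.sent_id = d_subj.sent_id AND subj.tok_id = d_subj.child
JOIN token AS verb
  ON verb.sent_id = d_subj.sent_id AND verb.tok_id = d_subj.head
JOIN token AS obj
  ON obj.sent_id = d_obj.sent_id AND obj.tok_id = d_obj.child
WHERE d_subj.label = 'nsubj' AND d_obj.label = 'dobj'
  AND subj.tag = 'PROTEIN' AND obj.tag = 'PROTEIN'
"""


def extract_interactions(connection):
    """Run the extraction query and return (subject, verb, object) tuples."""
    return connection.execute(INTERACTION_QUERY).fetchall()


rows = extract_interactions(conn)
print(rows)
```

Under these assumptions, changing the extraction goal (for example, to a different relation type) means writing a different query against the same stored parses; the corpus itself is not parsed again.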
Similar resources
Efficient Information Retrieval System using Incremental Approach
Information Retrieval Systems [12][19] are traditionally implemented as a pipeline of special-purpose processing modules targeting the extraction of a particular kind of information. A major drawback of such an approach is that whenever a new extraction goal emerges or a module is improved, extraction has to be reapplied from scratch to the entire text corpus even though only a small part of the...
Composite Kernels For Relation Extraction
The automatic extraction of relations between entities expressed in natural language text is an important problem for IR and text understanding. In this paper we show how different kernels for parse trees can be combined to improve the relation extraction quality. On a public benchmark dataset the combination of a kernel for phrase grammar parse trees and for dependency parse trees outperforms ...
Exploring syntactic structured features over parse trees for relation extraction using kernel methods
Extracting semantic relationships between entities from text documents is challenging in information extraction and important for deep information processing and management. This paper proposes to use the convolution kernel over parse trees together with support vector machines to model syntactic structured information for relation extraction. Compared with linear kernels, tree kernels can effe...
Extracting Causal Knowledge from a Medical Database Using Graphical Patterns
This paper reports the first part of a project that aims to develop a knowledge extraction and knowledge discovery system that extracts causal knowledge from textual databases. In this initial study, we develop a method to identify and extract cause-effect information that is explicitly expressed in medical abstracts in the Medline database. A set of graphical patterns were constructed that ind...
Exploiting Constituent Dependencies for Tree Kernel-Based Semantic Relation Extraction
This paper proposes a new approach to dynamically determine the tree span for tree kernel-based semantic relation extraction. It exploits constituent dependencies to keep the nodes and their head children along the path connecting the two entities, while removing the noisy information from the syntactic parse tree, eventually leading to a dynamic syntactic parse tree. This paper also explores e...
Publication date: 2010